Image Caption Generator Based On Deep Neural Networks

Authors

  • Jianhui Chen
  • Wenqiang Dong
  • Minchen Li
Abstract

In this project, we systematically analyze an image caption generation method based on deep neural networks. Given an image as input, the method outputs an English sentence describing the content of the image. We analyze three components of the method: the convolutional neural network (CNN), the recurrent neural network (RNN), and sentence generation. By replacing the CNN part with three state-of-the-art architectures, we find that VGGNet performs best according to the BLEU score. We also propose a simplified version of the Gated Recurrent Unit (GRU) as a new recurrent layer, implemented in both MATLAB and C++ in Caffe. The simplified GRU achieves results comparable to those of the long short-term memory (LSTM) method, but it has fewer parameters, which saves memory and speeds up training. Finally, we generate multiple sentences using beam search. The experiments show that the modified method can generate captions comparable to those of state-of-the-art methods while using less memory during training.
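
To make the beam search step mentioned in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation (the abstract states theirs is written in MATLAB and C++ in Caffe). The function `step_log_probs` and the toy probability table are hypothetical stand-ins for a trained CNN+RNN decoder, and the `<s>`/`</s>` markers are assumed sentence delimiters.

```python
import math


def beam_search(step_log_probs, beam_width, max_len, start="<s>", end="</s>"):
    """Return up to beam_width (tokens, log_prob) pairs, best first.

    step_log_probs(prefix) is a hypothetical stand-in for the decoder:
    it maps a tuple of tokens to a dict {next_token: log_probability}.
    """
    beams = [((start,), 0.0)]   # hypotheses still being extended
    finished = []               # hypotheses that emitted the end token
    for _ in range(max_len):
        # Extend every live hypothesis by every possible next token.
        candidates = []
        for tokens, score in beams:
            for tok, lp in step_log_probs(tokens).items():
                candidates.append((tokens + (tok,), score + lp))
        # Keep only the beam_width highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)      # hypotheses cut off at max_len
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


def toy_model(prefix):
    """Toy next-word distribution that depends only on the last token."""
    table = {
        "<s>": {"a": math.log(0.6), "the": math.log(0.4)},
        "a": {"dog": math.log(0.5), "cat": math.log(0.5)},
        "the": {"dog": math.log(0.9), "cat": math.log(0.1)},
        "dog": {"</s>": 0.0},
        "cat": {"</s>": 0.0},
    }
    return table[prefix[-1]]


captions = beam_search(toy_model, beam_width=2, max_len=4)
for tokens, score in captions:
    print(" ".join(tokens[1:-1]), round(math.exp(score), 2))
```

In this toy table a greedy decoder would pick "a" first, since it is the most probable first word, and could then reach a sentence probability of at most 0.30. With a beam width of 2 the search also keeps "the" alive and ranks "the dog" (0.36) first, which shows how beam search can return several candidate sentences that greedy decoding would miss.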

Similar resources

Topic-Specific Image Caption Generation

Recently, image captioning, which aims to automatically generate a textual description for an image, has attracted researchers from various fields. Encouraging performance has been achieved by applying deep neural networks. Most of these works aim at generating a single caption, which may be incomprehensive, especially for complex images. This paper proposes a topic-specific multi-caption generator, ...

Cystoscopy Image Classification Using Deep Convolutional Neural Networks

In the past three decades, the use of smart methods in medical diagnostic systems has attracted the attention of many researchers. However, no smart activity has been provided in the field of medical image processing for diagnosis of bladder cancer through cystoscopy images despite the high prevalence in the world. In this paper, two well-known convolutional neural networks (CNNs) ...

Keyword Generation for Biomedical Image Retrieval with Recurrent Neural Networks

This paper presents the modeling approaches performed by the FHDO Biomedical Computer Science Group (BCSG) for the caption prediction task at ImageCLEF 2017. The goal of the caption prediction task is to recreate original image captions by detecting the interplay of present visible elements. A large-scale collection of 164,614 biomedical images, represented as imageID caption pairs, extracted f...

Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks

Caption generation has long been seen as a difficult problem in Computer Vision and Natural Language Processing. In this paper, we present an image captioning model based on an end-to-end neural framework that combines a Convolutional Neural Network and a Recurrent Neural Network. Critical to our approach is a ranking objective that attempts to add discriminatory power to the model. Experiments on M...

Automated Image Captioning Using Nearest-Neighbors Approach Driven by Top-Object Detections

The significant performance gains in deep learning coupled with the exponential growth of image and video data on the Internet have resulted in the recent emergence of automated image captioning systems. Two broad paradigms have emerged in automated image captioning, i.e., generative model-based approaches and retrieval-based approaches. Although generative model-based approaches that use the r...

Publication date: 2016